Arm backend: Make the logic for temp_allocation_pool realistic#20064
gggekov wants to merge 1 commit into
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/20064
Note: Links to docs will display an error until the docs builds have been completed.
❌ 4 New Failures, 1 Pending, 3 Unrelated Failures
As of commit d11044e with merge base c4e3db0:
NEW FAILURES - The following jobs have failed:
FLAKY - The following jobs failed but were likely due to flakiness present on trunk:
This comment was automatically generated by Dr. CI and updates every 15 minutes.
Pull request overview
This PR updates the Arm bare-metal Ethos-U runner configuration to make temp_allocation_pool sizing and placement align with realistic Corstone SRAM constraints, and simplifies the flow by removing the dynamic “derive scratch size from PTE” option.
Changes:
- Remove the --specify_ethosu_scratch option from scripts/tests and stop deriving scratch size from PTEs.
- Set more realistic default ET_ARM_BAREMETAL_SCRATCH_TEMP_ALLOCATOR_POOL_SIZE values based on SYSTEM_CONFIG + MEMORY_MODE.
- Move ETHOSU_MODEL/ETHOSU_ARENA handling to the top-level runner CMake so linker script preprocessing gets the correct values.
Reviewed changes
Copilot reviewed 6 out of 6 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| examples/arm/run.sh | Removes --specify_ethosu_scratch support and adds extra configure logging. |
| examples/arm/executor_runner/CMakeLists.txt | Adjusts default scratch pool sizing and pre-processes linker scripts with ETHOSU_MODEL/ARENA based on memory mode. |
| backends/arm/test/test_model.py | Removes the PTE-derived scratch sizing option and associated build logic. |
| backends/arm/test/common.py | Updates the default U85 memory mode and extra flags used by tests. |
| backends/arm/scripts/get_ethosu_scratch_from_pte.py | Improves bundled-program detection and adjusts output formatting. |
| backends/arm/scripts/corstone_utils.cmake | Removes ETHOSU_MODEL/ARENA compile definitions from the Corstone helper target. |
@@ -82,7 +81,6 @@ function help() {
    echo " --config=<FILEPATH> Ethos-U: System configuration file that specifies system configurations (vela.ini)"
    echo " --memory_mode=<MODE> Ethos-U: Memory mode to select from the Vela configuration file (see vela.ini), e.g. Shared_Sram/Sram_Only. Default: 'Shared_Sram' for Ethos-U55 targets, 'Sram_Only' for Ethos-U85 targets"
# On silicon, for Sram_Only, the model is assumed to be in the SRAM but that
# limits the number of models we test for Sram_Only. For testing coverage &
# consistency, we place the model in the external memory for Sram_Only
# If no mention of BP08 (specifier for bundled program), return pte
if len(pte_data) < 8 or pte_data[4:8] != b"BP08":
    return pte_data
# bundled program
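The check in the hunk above can be sketched as a standalone predicate. This is a minimal illustration, assuming only what the quoted lines show: the buffer is treated as a bundled program when bytes 4 to 7 equal `BP08`. The helper name `is_bundled_program` and the constant `BUNDLED_PROGRAM_MAGIC` are hypothetical and do not come from the PR; extracting the embedded PTE from a bundle is not shown.

```python
# Hypothetical helper mirroring the quoted check; not part of the PR itself.
BUNDLED_PROGRAM_MAGIC = b"BP08"  # file identifier tested in the quoted hunk


def is_bundled_program(data: bytes) -> bool:
    """Return True when bytes 4..7 of the buffer carry the BP08 identifier."""
    # Buffers shorter than 8 bytes cannot contain the identifier at offset 4.
    return len(data) >= 8 and data[4:8] == BUNDLED_PROGRAM_MAGIC
```

A caller would return the buffer unchanged when the predicate is false, which is the branch the quoted code takes.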
No longer changing that as I am not using get_ethosu_scratch_from_pte.py
- Move the setting of ETHOSU_MODEL & ETHOSU_ARENA from corstone_utils.cmake to the top-level app CMake so that the linker script is correctly generated. As a result, the .bss.tensor section, used for the temp_allocation_pool, goes into the SRAM instead of the DDR for Shared_Sram.
- Use 2MB of temp_allocation_pool for Corstone-300 and 4MB-64KB of temp_allocation_pool for Corstone-320.
- Remove the specify_ethosu_scratch option
- Change default memory mode for the Ethos-U85 to Dedicated_Sram in order to match how we build the arm_executor_runner binary

Signed-off-by: George Gekov <george.gekov@arm.com>
Change-Id: I16ecd991d722b665f0faf4b0ec998427a381fed8
90a3b8b to d11044e
if(SYSTEM_CONFIG MATCHES "U55")
  # The Corstone-300 has 2MB of SRAM, provide the Ethos-U with the full 2MB
  # for the temp_allocation_pool array.
  set(ET_ARM_BAREMETAL_SCRATCH_TEMP_ALLOCATOR_POOL_SIZE 0x200000)
elseif(SYSTEM_CONFIG MATCHES "U85")
  if(MEMORY_MODE MATCHES "Dedicated_Sram")
    # 32MB of scratch buffer for Dedicated_Sram memory mode.
    set(ET_ARM_BAREMETAL_SCRATCH_TEMP_ALLOCATOR_POOL_SIZE 0x2000000)
    # For Dedicated_Sram, set the
    # ET_ARM_BAREMETAL_FAST_SCRATCH_TEMP_ALLOCATOR_POOL_SIZE to 384KB unless
    # specified otherwise.
    if(NOT DEFINED ET_ARM_BAREMETAL_FAST_SCRATCH_TEMP_ALLOCATOR_POOL_SIZE)
      set(ET_ARM_BAREMETAL_FAST_SCRATCH_TEMP_ALLOCATOR_POOL_SIZE 0x60000)
    endif()
  elseif(MEMORY_MODE MATCHES "Shared_Sram" OR MEMORY_MODE MATCHES "Sram_Only")
    # For Shared_Sram and Sram_Only, use a scratch buffer of 4MB - 64KB. The
    # Corstone-320 provides 4MB of SRAM and we subtract 64KB because we have a
    # few objects placed at the start of the SRAM, before the
    # temp_allocation_pool array.
    set(ET_ARM_BAREMETAL_SCRATCH_TEMP_ALLOCATOR_POOL_SIZE 0x3F0000)
  endif()
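To make the arithmetic behind these constants easy to check, here is a rough Python sketch of the same selection logic. It is an illustration only, not code from the PR: the function name `default_scratch_pool_size` and the constant `FAST_SCRATCH_DEFAULT` are hypothetical, and substring tests stand in for CMake's `MATCHES` (a regex search, which behaves the same way for these literal patterns).

```python
KB = 1024
MB = 1024 * KB

# Default for ET_ARM_BAREMETAL_FAST_SCRATCH_TEMP_ALLOCATOR_POOL_SIZE (384KB).
FAST_SCRATCH_DEFAULT = 384 * KB


def default_scratch_pool_size(system_config: str, memory_mode: str):
    """Return the default pool size in bytes, or None when no branch applies."""
    if "U55" in system_config:
        # Corstone-300: the full 2MB of SRAM.
        return 2 * MB
    if "U85" in system_config:
        if "Dedicated_Sram" in memory_mode:
            # 32MB scratch buffer for Dedicated_Sram.
            return 32 * MB
        if "Shared_Sram" in memory_mode or "Sram_Only" in memory_mode:
            # Corstone-320: 4MB of SRAM minus 64KB reserved for the objects
            # placed before the temp_allocation_pool array.
            return 4 * MB - 64 * KB
    return None
```

The byte counts line up with the hex literals in the CMake hunk: 2MB is 0x200000, 32MB is 0x2000000, 4MB - 64KB is 0x3F0000, and 384KB is 0x60000. The system configuration strings passed in are whatever the build supplies; only the "U55" or "U85" substring matters here.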
cc @digantdesai @freddan80 @per @zingo @oscarandersson8218 @mansnils @Sebastian-Larsson @robell @rascani